26 research outputs found

    Predicated execution and register windows for out-of-order processors

    ISA extensions are a powerful way to implement new hardware techniques that require or benefit from compiler support: decisions made at compile time can be complemented at runtime, achieving a synergistic effect between the compiler and the processor. This thesis focuses on two ISA extensions: predicated execution and register windows. Predicated execution is exploited by the if-conversion compiler technique. If-conversion removes control dependences by transforming them into data dependences, which helps to exploit ILP beyond a single basic block. Register windows reduce the number of loads and stores required to save and restore registers across procedure calls by storing multiple contexts in a large architectural register file. In-order processors especially benefit from both ISA extensions, which help overcome the limitations that control dependences and the memory hierarchy impose on static scheduling. Predicated execution allows control-dependent instructions to be moved past branches. Register windows reduce the number of memory operations across procedure calls. Although if-conversion and register windows were not developed exclusively for in-order processors, their use in out-of-order processors has been studied very little. In this thesis we show that if-conversion and register windows introduce both new performance opportunities and new challenges in out-of-order processors. The use of if-conversion in out-of-order processors helps to eliminate hard-to-predict branches, alleviating the severe performance penalties caused by branch mispredictions. However, removing some conditional branches by if-conversion may adversely affect the predictability of the remaining branches, because it may reduce the amount of correlation information available to the branch predictor. Moreover, predicated execution in out-of-order processors has to deal with two performance issues. First, multiple definitions of the same logical register can be merged into a single control flow, where each definition is guarded by a different predicate. Second, instructions whose guarding predicate evaluates to false consume resources unnecessarily. This thesis proposes a branch prediction scheme based on predicate prediction that solves the three problems mentioned above. This scheme, which is built on top of a predicated ISA that implements a compare-and-branch model such as the one considered in this thesis, has two advantages. First, branch prediction accuracy improves because the correlation information is not lost after if-conversion, and the proposed mechanism can use the computed value of the branch predicate when it is available, achieving 100% accuracy in that case. Second, it avoids the problems of executing predicated code out of order. Regarding register windows, we propose a mechanism that reduces the physical register requirements of an out-of-order processor to the bare minimum with almost no performance loss. The mechanism is based on identifying which architectural registers are in use by current in-flight instructions. Registers that are not in use, i.e. those that no in-flight instruction references, can be released early. In this thesis we propose a very efficient and low-cost hardware implementation of predicated execution and register windows that provides important benefits to out-of-order processors.
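    As a rough illustration of the if-conversion idea described in this abstract, the following Python sketch contrasts a branching function with an if-converted equivalent. It is only an analogy under stated assumptions: the function names are hypothetical, and real if-conversion is performed by a compiler on predicated machine instructions, not on source code.

```python
def abs_diff_branch(a, b):
    # Control dependence: which definition of r executes depends on the branch.
    if a > b:
        r = a - b
    else:
        r = b - a
    return r


def abs_diff_if_converted(a, b):
    # If-converted form (illustrative): the comparison defines a predicate p,
    # and both definitions of r are computed in a single control flow.
    p = int(a > b)          # predicate as 0 or 1
    r_true = a - b          # definition guarded by p
    r_false = b - a         # definition guarded by (not p)
    # The result now has a data dependence on p instead of a control dependence.
    return p * r_true + (1 - p) * r_false
```

    Note that the if-converted version always computes both `r_true` and `r_false`; this mirrors the two issues named in the abstract, namely multiple guarded definitions of the same register and wasted work on instructions whose predicate is false.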

    Model-free reinforcement learning with a non-linear reconstructor for closed-loop adaptive optics control with a pyramid wavefront sensor

    We present a model-free reinforcement learning (RL) predictive model with a supervised-learning non-linear reconstructor for adaptive optics (AO) control with a pyramid wavefront sensor (P-WFS). First, we analyse the additional problems of training an RL control method with a P-WFS compared to the Shack-Hartmann WFS. From those observations, we propose our solution: a combination of model-free RL for prediction with a non-linear reconstructor based on neural networks with a U-net architecture. We test the proposed method in a simulation of closed-loop AO for an 8 m telescope equipped with a 32x32 P-WFS and observe that both the prediction and the non-linear reconstruction provide benefits over an optimised integrator. This project has received funding from the European Union’s Horizon 2020 research and innovation program under the Marie Sklodowska-Curie grant agreement No 873120. Peer Reviewed. Postprint (author's final draft).

    Heuristic-based task-to-thread mapping in multi-core processors

    OpenMP can be used in real-time applications to enhance system performance. However, the predictability of OpenMP applications is still a challenge. This paper investigates heuristics for mapping OpenMP task graphs onto the underlying threads, for the development of time-predictable OpenMP programs. These approaches are based on a global scheduling queue, as well as per-thread allocation queues. The proposed method is divided into scheduling and allocation phases. In the former phase, OpenMP task-parts are discovered from the OpenMP graph and placed in the scheduling queue. Afterwards, an appropriate allocation queue is selected for each task-part using four heuristic algorithms. In the latter phase, the best task-part is selected from the allocation queue to be allocated to and executed by an idle thread. Preliminary simulation results show that the new method outperforms BFS and WFS in terms of scheduling time and idle time. This work has been co-funded by the European Commission through the AMPERE (H2020 grant agreement N° 745601) project. Peer Reviewed. Postprint (author's final draft).
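    The two-phase structure described in this abstract (a global scheduling queue feeding per-thread allocation queues) can be sketched as follows. This is a minimal sketch, not the paper's method: the abstract does not specify its four heuristics, so the "least-loaded queue" selection and the FIFO pick used here are assumptions, as are all function and parameter names.

```python
from collections import deque


def map_task_parts(task_parts, num_threads):
    """Scheduling phase: move task-parts from a global scheduling queue
    into per-thread allocation queues.

    task_parts: list of (name, cost) tuples in discovery order.
    Assumed heuristic (hypothetical): pick the queue with the least total cost.
    """
    scheduling_queue = deque(task_parts)
    allocation_queues = [deque() for _ in range(num_threads)]
    load = [0] * num_threads
    while scheduling_queue:
        name, cost = scheduling_queue.popleft()
        # Ties are broken in favour of the lowest thread index.
        target = min(range(num_threads), key=lambda t: load[t])
        allocation_queues[target].append((name, cost))
        load[target] += cost
    return allocation_queues


def next_task_part(allocation_queues, thread_id):
    """Allocation phase: an idle thread takes a task-part from its own queue.

    Assumed policy (hypothetical): first in, first out.
    """
    queue = allocation_queues[thread_id]
    return queue.popleft() if queue else None
```

    For example, `map_task_parts([("a", 4), ("b", 1), ("c", 2), ("d", 1)], 2)` places `a` on thread 0 and the three cheaper task-parts on thread 1, after which an idle thread calls `next_task_part` to fetch its next piece of work.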

    An elastic software architecture for extreme-scale big data analytics

    This chapter describes a software architecture for processing big-data analytics considering the complete compute continuum, from the edge to the cloud. The new generation of smart systems requires processing a vast amount of diverse information from distributed data sources. The software architecture presented in this chapter addresses two main challenges. On the one hand, a new elasticity concept enables smart systems to satisfy the performance requirements of extreme-scale analytics workloads. By extending the elasticity concept (known on the cloud side) across the compute continuum in a fog computing environment, combined with the use of advanced heterogeneous hardware architectures on the edge side, the capabilities of extreme-scale analytics can increase significantly, integrating both responsive data-in-motion and latent data-at-rest analytics into a single solution. On the other hand, the software architecture also focuses on fulfilling the non-functional properties inherited from smart systems, such as real-time behaviour, energy efficiency, communication quality and security, which are of paramount importance for many application domains such as smart cities, smart mobility and smart manufacturing. The research leading to these results has received funding from the European Union’s Horizon 2020 Programme under the ELASTIC Project (www.elastic-project.eu), grant agreement No 825473. Peer Reviewed. Postprint (published version).

    Simulation of pasture water requirements under a climate change scenario generated with singular spectrum analysis

    Although global climate models show that rates of change in temperature and precipitation are increasingly high, they do not provide the information that actors in the production-consumption system need to define adaptation measures at the local scale. This work demonstrates how more local climate change scenarios can be generated, based on the analysis of trends in the time series of climate elements. The Mann-Kendall test is proposed to determine whether there is a trend in the time series of two climate elements, and singular spectrum analysis is proposed to generate future scenarios of the climate variables analysed. The results showed that in two municipalities of Colombia (El Espinal, Tolima, and Mosquera, Cundinamarca), pará grass (Brachiaria mutica) and kikuyu grass (Pennisetum clandestinum), respectively, tend to require more water because of increased evapotranspiration; however, it is soil texture that determines drastic changes between climate scenarios in terms of irrigation requirements.
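    The Mann-Kendall test named in this abstract can be sketched in a few lines. The code below is a minimal version of the standard test statistic, assuming a series without tied values (so the variance needs no tie correction); the function name and the sample series are illustrative and do not come from the study.

```python
import math


def mann_kendall(series):
    """Return (S, Z) for the Mann-Kendall trend test.

    S sums the signs of all pairwise differences x[j] - x[i] with i < j.
    Z is the normal approximation with continuity correction.
    Assumption: no tied values, so Var(S) = n(n-1)(2n+5)/18.
    """
    n = len(series)
    s = 0
    for i in range(n - 1):
        for j in range(i + 1, n):
            diff = series[j] - series[i]
            s += (diff > 0) - (diff < 0)   # sign of the difference
    var_s = n * (n - 1) * (2 * n + 5) / 18.0
    if s > 0:
        z = (s - 1) / math.sqrt(var_s)
    elif s < 0:
        z = (s + 1) / math.sqrt(var_s)
    else:
        z = 0.0
    return s, z


# Made-up values for illustration only; not data from the study.
s, z = mann_kendall([1.0, 2.0, 3.0, 4.0, 5.0])
```

    A positive Z suggests an increasing trend and a negative Z a decreasing one; at the usual 5% two-sided level, |Z| > 1.96 is taken as evidence of a monotonic trend.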

    AI-powered edge computing evolution for beyond 5G communication networks

    Edge computing is a key enabling technology that is expected to play a crucial role in beyond 5G (B5G) and 6G communication networks. By bringing computation closer to where the data is generated, and leveraging Artificial Intelligence (AI) capabilities for advanced automation and orchestration, edge computing can enable a wide range of emerging applications with extreme requirements in terms of latency and computation, across multiple vertical domains. In this context, this paper first discusses the key technological challenges for the seamless integration of edge computing within B5G/6G and then presents a roadmap for the edge computing evolution, proposing a novel design approach for an open, intelligent, trustworthy, and distributed edge architecture. VERGE has received funding from the Smart Networks and Services Joint Undertaking (SNS JU) under the European Union’s Horizon Europe research and innovation programme under Grant Agreement No 101096034. Peer Reviewed. Postprint (author's final draft).

    Knowledge management. Multidisciplinary perspective. Volume 17

    The book “Gestión del Conocimiento. Perspectiva Multidisciplinaria” (Knowledge Management: A Multidisciplinary Perspective), Volume 17 of the Colección Unión Global, is the result of research: its chapters report research carried out by their authors. The book is an international, serial, continuous, peer-reviewed, open-access publication covering all areas of knowledge, aimed at contributing to processes of scientific, technological and humanistic knowledge management. The collection aspires to contribute to the cultivation, understanding, compilation and social appropriation of knowledge as intangible heritage of humanity, with the aim of contributing to the transformation of the sociocultural relations that underpin the social construction of knowledge and its recognition as a public good.

    Enhancing OpenMP tasking model: performance and portability

    OpenMP, the de facto standard programming model for symmetric multiprocessing in HPC, has seen its performance boosted continuously by the community, either through implementation enhancements or specification augmentations. Furthermore, the language has evolved from a prescriptive nature, as defined by the thread-centric model, to a descriptive behavior, as defined by the task-centric model. However, the overhead related to the orchestration of tasks is still relatively high. Applications exploiting very fine-grained parallelism and systems with a large number of cores may fail to scale. In this work, we propose to include the concept of a Task Dependency Graph (TDG) in the specification by introducing a new clause, named taskgraph, attached to task or target directives. By design, the TDG alleviates the overhead associated with the OpenMP tasking model, and it also facilitates linking OpenMP with other programming models that support task parallelism. According to our experiments, a GCC implementation of the taskgraph clause is able to significantly reduce the execution time of fine-grained task applications and increase their scalability with regard to the number of threads. This work has been supported by the EU H2020 project AMPERE under the grant agreement no. 871669. Peer Reviewed. Postprint (author's final draft).